University of Sydney


The data revolution


Using data science to understand the social world

  • What are the social sciences?
  • Theory.
  • Human behaviour as (at least in part) a stochastic process.
  • Specific issues encountered when studying human behaviour and the social world: difficulty running experiments, problems with self-reporting, use of survey weights.
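To make the point about survey weights concrete, here is a minimal sketch in base R. The data are invented for illustration: a hypothetical sample in which young people make up half the population but only a fifth of respondents. The variable names (survey, age.group, supports.policy, population.share) are assumptions, not part of any dataset used in this unit.

```r
# Hypothetical sample of 10 respondents: 2 young, 8 old
survey <- data.frame(
  age.group = c("young", "young", rep("old", 8)),
  supports.policy = c(1, 1, 0, 0, 1, 0, 0, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Assumed population shares (in practice these would come from a census)
population.share <- c(young = 0.5, old = 0.5)

# Share of each group among the respondents
sample.share <- prop.table(table(survey$age.group))

# Weight = population share / sample share, so under-sampled groups count for more
survey$weight <- as.numeric(population.share[survey$age.group]) /
  as.numeric(sample.share[survey$age.group])

# Compare the raw and the weighted estimate of support
unweighted <- mean(survey$supports.policy)
weighted <- weighted.mean(survey$supports.policy, survey$weight)

print(c(unweighted = unweighted, weighted = weighted))
```

The unweighted estimate understates support here because the under-sampled group is the more supportive one; the weights correct for that imbalance, but only for the characteristic they are built on.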

The teaching team

  • Lecturer and Unit coordinator: Dr Shaun Ratcliff
  • Tutor and guest lecturer: Nick James, PhD in maths and statistics and CTDS
  • Guest lecturer, seminar 4-1: Dr Roman Marchant, Lecturer in Machine Learning and Data Mining, CTDS
  • Guest lecturer, seminar 4-2: Dr Richard Scalzo, Research Engineer in Data Science, Faculty of Engineering and IT, CTDS
  • Guest lecturer, seminar 5-2: Professor Simon Jackman, CEO of the US Studies Centre, Professor of Political Science

  • Seminar 1-1: Understanding the social world using data science
  • Seminar 1-2: Visualising social science data
  • Seminar 2-1: Confounding factors and human behaviour
  • Seminar 2-2: Understanding economic behaviour
  • Seminar 3-1: The probability of real world problems
  • Seminar 3-2: Predicting outcomes in the social world
  • Seminar 4-1: Non-linear problems in the social sciences
  • Seminar 4-2: Regularisation and variable selection
  • Seminar 5-1: Survey design
  • Seminar 5-2: Measuring latent variables in the social world
  • Seminar 6-1: Causality and spatial data
  • Seminar 6-2: Quantitative social science in the wild and Conclusions

  • Unit of Study (read this first)
  • Canvas (then this)
  • Lectures and labs (attend these)
  • Additional workshops (and these)
  • Additional tasks (not compulsory)
  • Readings (do them)
  • Assessments (these too)
  • Plagiarism (don't do it)
  • WHS (don't fall over please)

Operating in the R environment

  • Why \(R\)
  • Flexibility: programming our own functions, and packages
  • Visualisation
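As an illustration of programming our own functions, the sketch below defines a small weighted median in base R. The name my.wtd.median is hypothetical, and this is one simple definition (the smallest value at which the cumulative weight share reaches one half), not necessarily the one used by any particular package.

```r
# A simple weighted median: sort the values, accumulate the weights,
# and return the first value at which at least half the total weight is reached.
my.wtd.median <- function(x, w) {
  stopifnot(length(x) == length(w))

  # Drop observations with a missing value or a missing weight
  keep <- !is.na(x) & !is.na(w)
  x <- x[keep]
  w <- w[keep]

  # Sort values (and their weights) from smallest to largest
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]

  # Cumulative share of the total weight
  cum.share <- cumsum(w) / sum(w)
  x[which(cum.share >= 0.5)[1]]
}

# Invented example: three heavily weighted low incomes and one lightly weighted high income
incomes <- c(20, 35, 50, 400)
weights <- c(3, 3, 3, 1)
my.wtd.median(incomes, weights)  # 35
```

Once a function like this exists, it can be reused on every variable and every subgroup, which is much of the appeal of working in a programming language rather than a point-and-click tool.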

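# Weighted summaries of three ideological dimensions, computed separately
# for each survey year and party-identification group.
# ddply() and summarize come from the plyr package; wtd.median() is assumed
# to be defined or loaded elsewhere in the session.
library(plyr)

pref.trends.cis <- 
  na.omit(ddply(anes.combined.data, .(year, party.id2), summarize, 
    confidence.intervals = c("Median", "Upper", "Lower"),   
    "Economic dimension"=wtd.median(eco.dimension, ftf.weights),
    "Racial dimension"=wtd.median(racial.dimension, ftf.weights),
    "Social dimension"=wtd.median(social.dimension, ftf.weights)))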

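# Interactive map of the Democratic two-party vote.
# us.shape.2 (polygons with a votes column) and pal.dat (a colour palette
# function, e.g. one built with colorNumeric()) are assumed to exist already.
library(leaflet)

vote.map <- leaflet() %>%
      # Light grey base map
      addProviderTiles("CartoDB.Positron", 
                options = providerTileOptions(opacity = 0.99)) %>%
      # Shade each polygon by its vote share
      addPolygons(data = us.shape.2, 
                stroke = FALSE, fillOpacity = 0.5, 
                smoothFactor = 0.5,
                color = ~pal.dat(votes))  %>%
      clearBounds() %>%
      # Legend showing proportions as percentages
      addLegend("bottomright", pal = pal.dat, 
                values = us.shape.2$votes,
                title = "Democratic two-party vote", 
                bins = 5, opacity = 1, 
                labFormat = labelFormat(suffix = '%', between = ', ',
                transform = function(x) 100 * x))

# Centre the view on the contiguous United States (longitude, latitude, zoom)
setView(vote.map, -98.35, 39.50, 4)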

Today's labs and exercises

  • Loading and analysing data in \(R\).
  • We will work on getting those of you who are unfamiliar with \(R\), or with the methods we are using, up to speed.
  • Visualising results using the ggplot() function.
  • Additional tasks are not compulsory. They are there to help you if you’re not familiar with \(R\).
  • Office hours. Additional workshops.
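For those new to \(R\), the sketch below shows the general shape of a load, summarise and plot workflow. It is only an illustration and assumes the ggplot2 package is installed. It uses the swiss dataset that ships with \(R\) (fertility and socio-economic indicators for 47 Swiss provinces, around 1888), written to a temporary CSV file as a stand-in for a real data file. The object names csv.path, swiss.data and fertility.plot are made up for this example.

```r
library(ggplot2)

# Stand-in for a real data file: write a built-in dataset to a temporary CSV
csv.path <- tempfile(fileext = ".csv")
write.csv(swiss, csv.path, row.names = TRUE)

# Load the data, using the first column (province names) as row names
swiss.data <- read.csv(csv.path, row.names = 1)

# Quick numerical summary of two variables
summary(swiss.data[, c("Fertility", "Education")])

# Scatter plot with a fitted linear trend
fertility.plot <- ggplot(swiss.data, aes(x = Education, y = Fertility)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "Draftees with education beyond primary school (%)",
       y = "Standardised fertility measure")

print(fertility.plot)
```

With a real dataset, only the read.csv() call and the variable names inside aes() would need to change.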

Contact details